

MRC Technical Summary Report #2625


RESIDUALS IN NONLINEAR REGRESSION


R.  D.  Cook  and  C.  L.  Tsai 




Mathematics  Research  Center 
University of Wisconsin-Madison
610  Walnut  Street 
Madison,  Wisconsin  53705 

January  1984 

(Received  November  3,  1983) 


*  Sponsored  by 

U.  S.  Army  Research  Office 
P.  O.  Box  12211 
Research  Triangle  Park 
North  Carolina  27709 




UNIVERSITY OF WISCONSIN-MADISON
MATHEMATICS  RESEARCH  CENTER 


RESIDUALS  IN  NONLINEAR  REGRESSION 
R. D. Cook and C. L. Tsai*


Technical Summary Report #2625
January  1984 


ABSTRACT

We employ a quadratic expansion to investigate the behavior of the
ordinary residuals in nonlinear regression. In particular, we derive
quadratic approximations for the mean and variance of the ordinary residuals,
and the covariances between the ordinary residuals and the fitted values.
This investigation leads to the conclusion that the ordinary residuals can
produce misleading results when used in diagnostic methods analogous to those
for linear regression. Consequently, we suggest a new type of residual that
overcomes many of the potential shortcomings of the ordinary residuals.


AMS  (MOS)  Subject  Classification:  62J02 

Key  Words:  Diagnostics,  intrinsic  curvature  array,  nonlinear  regression, 
residuals 

Work  unit  Number  4  (Statistics  and  Probability) 


*Department of Statistics and Operations Research, New York University,
New York, NY 10006

Sponsored in part by the United States Army under Contract No. DAAG29-80-C-0041.


SIGNIFICANCE  AND  EXPLANATION 

Statistical  methods  for  the  analysis  of  experimental  data  are  necessarily 
dependent  on  the  specification  of  a  model,  a  mathematical  formula  that 
describes  the  behavior  of  the  data  up  to  a  few  unknown  parameters.  Generally, 
a  model  can  be  visualized  as 

Datum (D) = Systematic component (S) + Random component (R)
or in abbreviated form D = S + R. The specification of a model often
involves making assumptions, such as "the data are normally distributed," that
may have little prior substantive support. Consequently, it becomes necessary
to  use  the  data  to  assess  the  adequacy  of  the  model.  Such  assessments  are 
extremely  important  in  statistical  analyses  since  erroneous  assumptions  can 
lead  to  erroneous  conclusions  (the  mistaken  conclusion  that  a  drug  is  not 
carcinogenic  could  have  devastating  results). 

Models that are linear in the unknown parameters, $\theta_1, \ldots, \theta_p$, have
systematic components that can be expressed as

$$S = X_1\theta_1 + X_2\theta_2 + \cdots + X_p\theta_p$$

where the $X_j$'s are nonrandom experimental variables whose values are known.
Many methods of assessing model adequacy are available for such linear
models. However, relatively little is known about how to assess model
adequacy when S is nonlinear in the parameters; for example
$S = \exp(x_1\theta_1 + \cdots + x_p\theta_p)$. The purpose of this paper is to provide a
foundation  for  the  development  of  methods  for  assessing  the  adequacy  of  models 
that  are  nonlinear  in  the  parameters. 


The  responsibility  for  the  wording  and  views  expressed  in  this  descriptive 
summary  lies  with  MRC,  and  not  with  the  authors  of  this  report. 


RESIDUALS  IN  NONLINEAR  REGRESSION 
R.  D.  Cook  and  C.  L.  Tsai* 

1.  INTRODUCTION 

Diagnostic methods are useful for assessing the adequacy of assumptions underlying the
modeling process and for identifying unexpected characteristics of the data that may
seriously influence conclusions or require special attention. It is widely held that the
diagnostic phase is an important part of any regression analysis.

A variety of diagnostic methods are available to aid in analyses based on linear
regression models (Cook and Weisberg 1982 provide a review). For the most part, the
development of these methods is dependent on a thorough study and characterization of the
exact small sample behavior of a few fundamental building blocks such as the ordinary
residuals and related statistics. The interpretation of standard residual plots, for
example, depends on the knowledge that the expectations of the residuals are zero under a
correct model.

In more complicated settings such as nonlinear regression, the exact small sample
behavior of the corresponding building blocks is generally intractable so that some degree
of approximation is necessary. In addition, the nonlinear regression problem involves new
concerns that do not have counterparts in linear regression and thus may require the
development of new diagnostic methods.

Diagnostics for nonlinear regression can be constructed by using first-order
extensions of analogous methods for linear regression (see, for example, Cook and Weisberg
1982). Generally, these diagnostics are based on the assumption that the usual tangent
plane approximation to the solution locus is adequate, so that the nonlinear model is
essentially linear in a neighborhood of the estimated parameters. While such diagnostic
methods are certainly useful as first approximations and will often provide important
information, a deeper analysis may be required for an adequate understanding of nonlinear
regression.

*Department of Statistics and Operations Research, New York University, New York, NY 10006
Sponsored in part by the United States Army under Contract No. DAAG29-80-C-0041.


In this paper, we investigate properties of the ordinary residuals and related
quantities from nonlinear regression. This investigation is based on the quadratic
approximation of the ordinary residuals developed in section 2. In section 3, we derive
informative expressions for the expectation and variance of the vector of ordinary
residuals, and discuss why these residuals may not be an adequate basis for diagnostic
methods. In section 4, we propose a new type of residual for use in nonlinear
regression. It is shown that these new residuals overcome many of the failings of the
ordinary residuals and that they can be used in much the same way as the ordinary residuals
from linear regression. In the remainder of this section, we establish notation and
briefly review relevant background material.

The standard nonlinear regression model can be represented as

$$y_i = f(x_i, \theta) + \varepsilon_i, \quad i = 1, \ldots, n \tag{1}$$

where $x_i$ represents a vector of known explanatory variables associated with the $i$-th
observable response $y_i$, $\theta$ is a $p \times 1$ vector of unknown parameters, the response
function $f$ is assumed to be known, continuous and twice differentiable in $\theta$, and the
errors $\varepsilon_i$ are assumed to be independent, identically distributed normal random variables
with mean 0 and variance $\sigma^2$. For this model, the maximum likelihood estimator $\hat\theta$ of
$\theta$ can be found by minimizing the objective function

$$J(\theta) = \sum_{i=1}^{n} \{y_i - f(x_i, \theta)\}^2 \tag{2}$$

Kennedy and Gentle (1980) discuss computational methods for obtaining $\hat\theta$; for our purposes
we assume that $\hat\theta$ is available. The asymptotic behavior of $\hat\theta$ is investigated by Wu
(1981) who provides additional references. The usual estimator of $\sigma^2$ is
$s^2 = J(\hat\theta)/(n - p)$.
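As a concrete illustration of (2) and of $s^2$, the following is a minimal sketch, assuming a hypothetical exponential-decay response function, hypothetical design points and simulated errors. It uses plain Gauss-Newton iterations, which is one common choice and not necessarily the method discussed by Kennedy and Gentle (1980).

```python
import numpy as np

def f(x, theta):
    """Hypothetical response function f(x, theta) = a * exp(-b * x)."""
    a, b = theta
    return a * np.exp(-b * x)

def jac(x, theta):
    """n x p matrix of first derivatives of f with respect to (a, b)."""
    a, b = theta
    ex = np.exp(-b * x)
    return np.column_stack([ex, -a * x * ex])

def objective(x, y, theta):
    """Sum of squared deviations J(theta), eq. (2)."""
    r = y - f(x, theta)
    return float(r @ r)

def gauss_newton(x, y, theta0, iters=50, tol=1e-12):
    """Undamped Gauss-Newton iterations; assumes a reasonable starting value."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(iters):
        r = y - f(x, theta)
        step = np.linalg.lstsq(jac(x, theta), r, rcond=None)[0]
        theta = theta + step
        if np.linalg.norm(step) < tol:
            break
    return theta

x = np.linspace(0.0, 2.0, 10)                 # assumed design points
theta_true = np.array([2.0, 1.0])             # assumed "true" parameters
rng = np.random.default_rng(0)
y = f(x, theta_true) + 0.01 * rng.standard_normal(x.size)

theta_hat = gauss_newton(x, y, theta_true)
s2 = objective(x, y, theta_hat) / (x.size - theta_true.size)   # s^2 = J(theta_hat)/(n - p)
```

In this sketch the iteration is started at the true value purely for convenience; in practice a starting value has to come from elsewhere.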

For notational convenience, let $f_i = f(x_i, \theta)$, $i = 1, 2, \ldots, n$, and let $V$ denote the
$n \times p$ matrix with elements $f_i^r = \partial f_i/\partial\theta_r$, $i = 1, 2, \ldots, n$, $r = 1, 2, \ldots, p$. Unless
indicated otherwise, all derivatives are evaluated at the true parameter values. Various
quadratic expansions used in the following sections involve the $p \times p$ matrices $W_i$,
$i = 1, 2, \ldots, n$; the elements of $W_i$ are $f_i^{rs} = \partial^2 f_i/\partial\theta_r\partial\theta_s$, $r, s = 1, 2, \ldots, p$. These
matrices can be written conveniently in an $n \times p \times p$ array $W$ (Bates and Watts, 1980).
The $kj$-th "column" of $W$ is the $kj$-th second derivative vector with elements
$f_i^{kj}$, $i = 1, 2, \ldots, n$, while the $i$-th face of $W$ is the $p \times p$ matrix consisting of the
$i$-th elements of the second derivative vectors.
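The layout of $V$ and of the three-dimensional array $W$ can be made concrete with a small sketch. The response function $a\,e^{-bx}$, the design points and the parameter values below are assumptions chosen only for illustration; the array is stored with the observation index first, so that `W[i]` is the $i$-th face and `W[:, k, j]` is the $kj$-th second derivative vector.

```python
import numpy as np

def response(x, theta):
    """Hypothetical response function a * exp(-b * x)."""
    a, b = theta
    return a * np.exp(-b * x)

def first_derivatives(x, theta):
    """n x p matrix V with elements df_i/dtheta_r."""
    a, b = theta
    ex = np.exp(-b * x)
    return np.column_stack([ex, -a * x * ex])

def second_derivatives(x, theta):
    """n x p x p array W; W[i] is the i-th face, W[:, k, j] the kj-th 'column'."""
    a, b = theta
    ex = np.exp(-b * x)
    W = np.zeros((x.size, 2, 2))
    W[:, 0, 1] = W[:, 1, 0] = -x * ex       # d2f / da db
    W[:, 1, 1] = a * x**2 * ex              # d2f / db2 (d2f / da2 is zero)
    return W

x = np.linspace(0.0, 2.0, 10)               # assumed design points
theta = np.array([2.0, 1.0])                # assumed parameter values
V = first_derivatives(x, theta)
W = second_derivatives(x, theta)

face_3 = W[3]            # p x p matrix of second derivatives for one observation
column_01 = W[:, 0, 1]   # n-vector: one second derivative vector
```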




The $n \times 1$ vector of ordinary residuals $e$ can be written as

$$e = Y - f(\hat\theta) \tag{3}$$

where $Y$ and $f(\hat\theta)$ are $n \times 1$ vectors with elements $y_i$ and $f(x_i, \hat\theta)$, $i = 1, 2, \ldots, n$,
respectively. The vector $e$ is of course a function of the errors $\varepsilon_i$, $i = 1, 2, \ldots, n$.
To investigate the properties of $e$, we use the quadratic expansion of the right side of
(3) obtained by ignoring all terms that involve cubic and higher powers in the errors.
This method of approximation is closely related to that in Cox and Snell (1968), Box (1971)
and Clarke (1980).

The standard quadratic expansion of $f(\hat\theta)$ about the true value $\theta^*$ is

$$f(\hat\theta) \approx f(\theta^*) + V(\hat\theta - \theta^*) + \tfrac{1}{2} (\hat\theta - \theta^*)^T W (\hat\theta - \theta^*) \tag{4}$$

where $W$ is the $n \times p \times p$ array with $i$-th face $W_i$, $i = 1, 2, \ldots, n$. Multiplication
involving three dimensional arrays is defined as in Bates and Watts (1980) so that the
third term of (4) is an $n \times 1$ vector with elements $(\hat\theta - \theta^*)^T W_i (\hat\theta - \theta^*)/2$,
$i = 1, 2, \ldots, n$. Substituting (4) into (3) we obtain the initial representation

$$e \approx \varepsilon - V\phi - \tfrac{1}{2} \phi^T W \phi \tag{5}$$

where for notational convenience $\phi = \hat\theta - \theta^*$. Since cubic and higher powers in the $\varepsilon_i$'s
are to be ignored, the standard first-order approximation $\phi \approx (V^TV)^{-1}V^T\varepsilon$ (Cox and Snell,
1968) can be substituted into the third term of (5):

$$\phi^T W \phi \approx \varepsilon^T V (V^TV)^{-1} W (V^TV)^{-1} V^T \varepsilon \tag{6}$$

To evaluate the second term of (5), we require a quadratic approximation of $\hat\theta$. Such
an approximation can be obtained from the quadratic expansion of the likelihood equations
about the true value $\theta^*$ (Cox and Snell, 1968). As shown in the Appendix, this yields

$$V\phi \approx P_1\varepsilon + V(V^TV)^{-1} \sum_i (\varepsilon^T \ell_i) W_i (V^TV)^{-1} V^T \varepsilon - \tfrac{1}{2} P_1 \{\varepsilon^T V (V^TV)^{-1} W (V^TV)^{-1} V^T \varepsilon\} \tag{7}$$

where $P_1 = V(V^TV)^{-1}V^T$ is the projection operator for the column space of $V$ and $\ell_i$ is
the $i$-th column of $I - P_1$.


In the remainder of this paper, we use $C(F)$ to indicate the column space of the
matrix $F$. Thus for example, the tangent plane at $\theta^*$ is the affine subspace
$f(\theta^*) + C(V)$. The orthogonal complement of $C(F)$ will be denoted by $C'(F)$.

Expressions (6) and (7), which form the essential ingredients of (5), can be expressed
more informatively in terms of the QR-decomposition $V = QR$ of $V$. Here $Q$ is an
$n \times n$ matrix with orthonormal columns and $R^T = (L^T, 0)$ where $L$ is a $p \times p$,
nonsingular, upper triangular matrix. Partition $Q = (U, N)$ where $U$ is $n \times p$. The
columns of $U$ form an orthonormal basis for $C(V)$ and the columns of $N$ form an
orthonormal basis for $C'(V)$ so that $C(N) = C'(V)$. In terms of the transformed
coordinates $\gamma = L(\theta - \theta^*)$, the first and second derivative vectors are given by the
columns of $U$ and $\tilde W = L^{-T} W L^{-1}$. The $i$-th face of $\tilde W$ is simply

$$\tilde W_i = L^{-T} W_i L^{-1} \tag{8}$$
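The QR-based quantities can be computed directly with standard linear algebra routines. The sketch below assumes the same hypothetical response function $a\,e^{-bx}$ and parameter values used purely for illustration; it forms $U$, $N$, $L$ and the faces of the transformed array in (8), and compares $UU^T$ with the projection $V(V^TV)^{-1}V^T$.

```python
import numpy as np

x = np.linspace(0.0, 2.0, 10)                 # assumed design points
a, b = 2.0, 1.0                               # assumed parameter values
ex = np.exp(-b * x)
V = np.column_stack([ex, -a * x * ex])        # n x p first derivatives
W = np.zeros((x.size, 2, 2))                  # n x p x p second derivatives
W[:, 0, 1] = W[:, 1, 0] = -x * ex
W[:, 1, 1] = a * x**2 * ex
n, p = V.shape

Q, R = np.linalg.qr(V, mode="complete")       # Q is n x n, R is n x p
U, N = Q[:, :p], Q[:, p:]                     # orthonormal bases for C(V) and C'(V)
L = R[:p, :]                                  # p x p upper triangular block of R
L_inv = np.linalg.inv(L)

# i-th face of the transformed array: L^{-T} W_i L^{-1}, eq. (8)
W_tilde = np.einsum("ra,irs,sb->iab", L_inv, W, L_inv)

# Projection onto C(V), for comparison with U U^T
P1 = V @ np.linalg.solve(V.T @ V, V.T)
```

NumPy may return an $L$ with negative diagonal entries; this does not affect the projections or any of the quantities built from them.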

Using the QR-decomposition to simplify (6) and (7), and substituting the resulting
expressions into (5) gives

$$e \approx N\eta - U \sum_i (\varepsilon^T \ell_i) \tilde W_i \tau - \tfrac{1}{2} N N^T (\tau^T \tilde W \tau) \tag{9}$$

where $(\tau^T, \eta^T)^T = Q^T\varepsilon$ is the vector of rotated errors. The components of $Q^T\varepsilon$ are, of
course, independent and follow the same distribution as that assumed for $\varepsilon$.

Some additional discussion of (9) should prove useful. First, the $ab$-th element of
the $p \times p$ matrix $B_\eta = \sum_i (\varepsilon^T \ell_i) \tilde W_i$ is $\varepsilon^T N N^T \tilde w_{ab} = \eta^T N^T \tilde w_{ab}$, where $\tilde w_{ab}$ is the $ab$-th
second derivative vector in the $\gamma$ coordinates, i.e. $\tilde w_{ab}$ is the $ab$-th column of $\tilde W$. This
matrix is closely related to the effective residual curvature matrix $B$ described in
Hamilton, Watts and Bates (1982); $B$ is obtained from $B_\eta$ by replacing $\varepsilon$ with $e$.
Second, the quadratic form in the final term of (9) can be written as

$$N N^T (\tau^T \tilde W \tau) = N(\tau^T A \tau) = \tau^T W^* \tau \tag{10}$$

where $A$ is the $(n - p) \times p \times p$ intrinsic curvature array (Bates and Watts, 1980) and
$W^*$ is obtained from $\tilde W$ by projecting each second derivative vector onto $C(N)$. Third,
it is easily seen that the approximation given in (9) is invariant under parameter
transformations, as expected. Finally, the first term $N\eta = N N^T \varepsilon$ is simply the standard
linear approximation.

It follows from the above discussion that $e$ can be expressed informatively as

$$e \approx N\eta - U B_\eta \tau - \tfrac{1}{2} \tau^T W^* \tau \tag{11}$$

In terms of the basis $U$, the elements of $-B_\eta\tau$ are the coordinates of the projection
of $e$ onto $C(V)$, while $(\eta - \tfrac{1}{2} \tau^T A \tau)$ contains the coordinates in the basis provided
by $N$ of the projection of $e$ onto $C'(V)$.

Equation (11) gives our final quadratic approximation of $e$. In the next section, we
use this approximation to investigate the moments of $e$.
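A small simulation can indicate how the quadratic approximation (11) compares with the linear approximation $N\eta$. The sketch below is illustrative only: the response function $a\,e^{-bx}$, the design, the error standard deviation, the seed and the Gauss-Newton fitting routine are all assumptions, and derivatives are evaluated at the assumed true parameter values as in the text.

```python
import numpy as np

def f(x, th):
    """Hypothetical response function a * exp(-b * x)."""
    return th[0] * np.exp(-th[1] * x)

def derivs(x, th):
    """First (n x p) and second (n x p x p) derivative arrays of f."""
    ex = np.exp(-th[1] * x)
    V = np.column_stack([ex, -th[0] * x * ex])
    W = np.zeros((x.size, 2, 2))
    W[:, 0, 1] = W[:, 1, 0] = -x * ex
    W[:, 1, 1] = th[0] * x**2 * ex
    return V, W

def fit(x, y, th0, iters=100):
    """Gauss-Newton least squares fit (an assumed, simple choice of algorithm)."""
    th = np.array(th0, dtype=float)
    for _ in range(iters):
        step = np.linalg.lstsq(derivs(x, th)[0], y - f(x, th), rcond=None)[0]
        th += step
        if np.linalg.norm(step) < 1e-13:
            break
    return th

x = np.linspace(0.0, 2.0, 10)
theta_star = np.array([2.0, 1.0])             # assumed true parameter values
rng = np.random.default_rng(1)
eps = 0.02 * rng.standard_normal(x.size)      # simulated errors
y = f(x, theta_star) + eps
e = y - f(x, fit(x, y, theta_star))           # ordinary residuals, eq. (3)

V, W = derivs(x, theta_star)                  # derivatives at the true value
p = V.shape[1]
Q, R = np.linalg.qr(V, mode="complete")
U, N = Q[:, :p], Q[:, p:]
L_inv = np.linalg.inv(R[:p, :])
W_tilde = np.einsum("ra,irs,sb->iab", L_inv, W, L_inv)

tau, eta = U.T @ eps, N.T @ eps               # rotated errors
B_eta = np.einsum("i,iab->ab", N @ eta, W_tilde)        # ab-th element: eta^T N^T w_ab
quad = np.einsum("a,iab,b->i", tau, W_tilde, tau)       # tau^T W-tilde tau (n-vector)
W_star_term = N @ (N.T @ quad)                          # tau^T W* tau, eq. (10)

e_linear = N @ eta
e_quadratic = N @ eta - U @ (B_eta @ tau) - 0.5 * W_star_term   # eq. (11)
err_linear = np.linalg.norm(e - e_linear)
err_quadratic = np.linalg.norm(e - e_quadratic)
```

With small errors one would expect `err_quadratic` to be noticeably smaller than `err_linear`, since the neglected terms are of third order in the errors rather than second.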




3.  MOMENTS  OF  e 


Since $\tau$ and $\eta$ are independent, it follows immediately from (11) that, to the
degree of accuracy provided by the quadratic approximation,

$$Ee = -\tfrac{1}{2} E(\tau^T W^* \tau) = -\tfrac{1}{2} N N^T E(\tau^T \tilde W \tau) = N N^T d \tag{12}$$

where $d$ is an $n \times 1$ vector with elements $-\sigma^2 \mathrm{tr}(\tilde W_i)/2 = -\sigma^2 \mathrm{tr}\{(V^TV)^{-1} W_i\}/2$,
$i = 1, 2, \ldots, n$. The vector $d$ is essentially the expected difference between the linear
and quadratic approximations of $f(\hat\theta)$ (see eq. 4) so that $Ee$ is the projection of this
expected difference onto $C(N)$. The expectation of $e$ can also be expressed in terms of
$A$:

$$Ee = -\tfrac{1}{2} \sigma^2 N \sum_{i=1}^{p} a_i \tag{13}$$

where $a_i$ is an $(n - p) \times 1$ vector with elements $a_{jii}$, $j = 1, 2, \ldots, n - p$, and
$a_{jii}$ is the $i$-th diagonal element of the $j$-th face of $A$. The results in equations (12)
and (13) agree with those of Cox and Snell (1968) for the special case of model (1).
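The two expressions for $Ee$ can be checked against each other numerically. The following sketch assumes a hypothetical response function $a\,e^{-bx}$ with made-up parameter values and error variance; it evaluates (12) through the vector $d$ and (13) through the traces of the faces of the intrinsic curvature array $A$.

```python
import numpy as np

x = np.linspace(0.0, 2.0, 10)                 # assumed design points
a, b, sigma2 = 2.0, 1.0, 0.04                 # assumed parameters and error variance
ex = np.exp(-b * x)
V = np.column_stack([ex, -a * x * ex])
W = np.zeros((x.size, 2, 2))
W[:, 0, 1] = W[:, 1, 0] = -x * ex
W[:, 1, 1] = a * x**2 * ex
n, p = V.shape

Q, R = np.linalg.qr(V, mode="complete")
U, N = Q[:, :p], Q[:, p:]
L_inv = np.linalg.inv(R[:p, :])
W_tilde = np.einsum("ra,irs,sb->iab", L_inv, W, L_inv)

# Eq. (12): d_i = -sigma^2 tr{(V^T V)^{-1} W_i} / 2, then project onto C(N)
VtV_inv = np.linalg.inv(V.T @ V)
d = -0.5 * sigma2 * np.einsum("rs,isr->i", VtV_inv, W)
Ee_12 = N @ (N.T @ d)

# Eq. (13): faces of A are A_j = sum_i N[i, j] * W_tilde[i]
A = np.einsum("ij,iab->jab", N, W_tilde)      # (n - p) x p x p
a_sum = np.einsum("jaa->j", A)                # sum over i of a_i: trace of each face
Ee_13 = -0.5 * sigma2 * (N @ a_sum)
```

Both routes should agree to rounding error, and the result lies in $C(N)$, so it is orthogonal to the columns of $V$.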

From the discussion at the end of section 2, it is easily seen that the three addends
in equation (11) are uncorrelated so that

$$\mathrm{Var}(e) = \mathrm{Var}(N\eta) + \mathrm{Var}(U B_\eta \tau) + \tfrac{1}{4} \mathrm{Var}(\tau^T W^* \tau)$$
$$\phantom{\mathrm{Var}(e)} = N N^T \sigma^2 + U (E B_\eta B_\eta^T) U^T \sigma^2 + \tfrac{1}{4} \mathrm{Var}(\tau^T W^* \tau) \tag{14}$$

The matrix $B_\eta$ can be written as $\sum_i \eta_i A_i$ where $A_i$ is the $i$-th face of $A$ and $\eta_i$
is the $i$-th component of $\eta$, $i = 1, 2, \ldots, n - p$. From this it follows immediately that

$$E(B_\eta B_\eta^T) = \sigma^2 K \tag{15}$$

where $K = \sum_{i=1}^{n-p} A_i^2$. Next, $\mathrm{Var}(\tau^T W^* \tau) = N \mathrm{Var}(\tau^T A \tau) N^T$ and the $ij$-th element of $\mathrm{Var}(\tau^T A \tau)$
is $\mathrm{Cov}(\tau^T A_i \tau, \tau^T A_j \tau) = 2\sigma^4 \mathrm{tr}(A_i A_j)$, $i, j = 1, 2, \ldots, n - p$. Substituting this and (15) into
(14) yields

$$\mathrm{Var}(e) = N N^T \sigma^2 + U K U^T \sigma^4 + \tfrac{1}{2} N Z N^T \sigma^4 \tag{16}$$

where $Z$ is the $(n - p) \times (n - p)$ matrix with elements $\mathrm{tr}(A_i A_j)$.
Alternatively, $Z$ can be expressed as

$$Z = \sum_a \sum_b a_{ab} a_{ab}^T \tag{17}$$

with $a_{ab}$ the $ab$-th column of $A$. From this and the forms of the first and second terms
in (16), we see that each term of (16) is positive semi-definite. The quadratic terms can
therefore only add to $N N^T \sigma^2$, so that the standard linear approximation will underestimate the
variances of the residuals. The amount of underestimation depends heavily on the intrinsic
curvature array $A$, as should be clear from an inspection of (16). However, there is some
doubt about the usefulness of using the elements of $A$ as indicators of the adequacy of
the linear approximation. In terms of the basis $N$, the columns of $A$ contain the
coordinates of the projections of the second derivative vectors $\tilde w_{ab}$ onto $C'(V)$. If the
basis for $C'(V)$ is changed, $A$ will change. On the other hand, (16) is invariant under
such changes.
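The invariance remark can be illustrated numerically. In the sketch below (hypothetical response function $a\,e^{-bx}$, made-up parameter values and error variance), the basis $N$ of $C'(V)$ is replaced by $NG$ for a random orthogonal matrix $G$: the array $A$ changes, while the linear and quadratic parts of (16) do not.

```python
import numpy as np

x = np.linspace(0.0, 2.0, 10)                 # assumed design points
a, b, sigma2 = 2.0, 1.0, 0.04                 # assumed parameters and error variance
ex = np.exp(-b * x)
V = np.column_stack([ex, -a * x * ex])
W = np.zeros((x.size, 2, 2))
W[:, 0, 1] = W[:, 1, 0] = -x * ex
W[:, 1, 1] = a * x**2 * ex
n, p = V.shape

Q, R = np.linalg.qr(V, mode="complete")
U, N = Q[:, :p], Q[:, p:]
L_inv = np.linalg.inv(R[:p, :])
W_tilde = np.einsum("ra,irs,sb->iab", L_inv, W, L_inv)

def variance_parts(N_basis):
    """Linear and quadratic parts of Var(e) in eq. (16) for a given basis of C'(V)."""
    A = np.einsum("ij,iab->jab", N_basis, W_tilde)     # intrinsic curvature array
    K = np.einsum("jab,jbc->ac", A, A)                 # sum of squared faces, eq. (15)
    Z = np.einsum("iab,jab->ij", A, A)                 # tr(A_i A_j); faces are symmetric
    linear = sigma2 * (N_basis @ N_basis.T)
    quadratic = sigma2**2 * (U @ K @ U.T + 0.5 * (N_basis @ Z @ N_basis.T))
    return A, linear, quadratic

A, lin, quad = variance_parts(N)

# Change the basis of C'(V) by a random orthogonal matrix G
rng = np.random.default_rng(0)
G = np.linalg.qr(rng.standard_normal((n - p, n - p)))[0]
A_rot, lin_rot, quad_rot = variance_parts(N @ G)

# Share of each residual variance contributed by the quadratic terms
quadratic_share = np.diag(quad) / np.diag(lin + quad)
```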

In linear regression, the interpretation of the standard diagnostic plot of the
residuals versus the fitted values depends on the fact that the plotted quantities are
uncorrelated. The interpretation of the corresponding plot in nonlinear regression may be
more difficult since the residuals $e$ and fitted values $f(\hat\theta)$ are generally correlated.
The previous development allows for a rather straightforward determination of the nature of
the dependence between $e$ and $f(\hat\theta)$: From (11),

$$f(\hat\theta) = f(\theta^*) + U\tau + U B_\eta \tau + \tfrac{1}{2} \tau^T W^* \tau \tag{18}$$

Using (18), (11) and the symmetry of the error distribution, we find that

$$\mathrm{Cov}(e, f(\hat\theta)) = -\mathrm{Var}(U B_\eta \tau) - \tfrac{1}{4} \mathrm{Var}(\tau^T W^* \tau) \tag{19}$$

so that

$$\mathrm{Var}(e) = N N^T \sigma^2 - \mathrm{Cov}(e, f(\hat\theta)) \tag{20}$$

Thus, the covariances between the corresponding elements of $e$ and $f(\hat\theta)$ are negative.
These covariances will be small when the linear and quadratic approximations of $\mathrm{Var}(e)$
are close.


The results of this section clearly indicate that diagnostic methods based on the
standard linear approximation can potentially fail. For example, an ordinary residual may
appear to be unusually large because its expectation differs substantially from zero, or
because the linear approximation of $\mathrm{Var}(e)$ does not accurately reflect its variance. A
plot of the ordinary residuals may exhibit systematic features because the corresponding
plot of $Ee$ exhibits such features, or because $\mathrm{Cov}(e, f(\hat\theta))$ is not sufficiently small.
Generally,  unusual  characteristics  of  e  alone  are  not  sufficient  to  infer  a  failing  of 
the  model  or  data.  In  the  next  section  we  suggest  ways  to  overcome  the  apparent 
shortcomings  of  the  ordinary  residuals. 


4.  PROJECTED  RESIDUALS 

A variety of useful diagnostics can be obtained by projecting $e$ onto selected
subspaces. As a class we call these projected residuals.

Recall from (11) that the difference between the linear and quadratic approximations
of $e$ depends on $U B_\eta \tau$ and $\tau^T W^* \tau$. These terms account for the potential problems that
may be encountered in diagnostic analyses based on linear approximations. Clearly, $U B_\eta \tau$
is in $C(U)$ and $\tau^T W^* \tau$ is in $C(W^*)$, the column space spanned by $w^*_{ab}$,
$a, b = 1, 2, \ldots, p$, which is a subspace of $C(N) = C'(U)$. Thus, the effect of these terms
can be removed by projecting $e$ onto $C'(U, W^*) = C'(U, \tilde W) = C'(V, W)$.

Let $P_{12}$, $P_1$ and $P_{2/1}$ denote the projection operators for $C(U, W^*)$, $C(U)$ and
$C(W^*)$, respectively. Projection operators for orthogonal subspaces will be indicated
by $P'$. Then $P_{12} = P_1 + P_{2/1}$ and

$$P'_{12} e = P'_1 e - P_{2/1} e \approx P'_1 \varepsilon - P_{2/1} \varepsilon \tag{21}$$

The first term of (21) is the linear approximation; the second term reflects the adjustment
necessary to remove the quadratic component of $e$. If the columns of $W^*$ are "small", so
that the second derivatives are unimportant, we will have $P'_{12} e \approx e$ and nothing will be
lost by considering $P'_{12} e$. On the other hand, if the second derivatives are important, the
adjustment provided by (21) will be important also.

The projected residuals have several useful properties in common with the residuals
from linear regression. First, we clearly have $E(P'_{12} e) = 0$. Second, the projected
residuals and the fitted values are uncorrelated. This property follows since $P'_{12} e$
depends only on $\eta$ which is independent of $\tau$. Finally,

$$\mathrm{Var}(P'_{12} e) = P'_{12} \sigma^2 \tag{22}$$

and

$$E(e^T P'_{12} e) = \sigma^2 \mathrm{tr}(P'_{12}) \tag{23}$$

From (22) we see that the construction of Studentized projected residuals is
straightforward, while (23) shows how to construct estimates of $\sigma^2$ that are free of the
bias contributed by the quadratic terms.
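The construction of projected residuals amounts to regressing $e$ on the columns of $V$ together with the second derivative vectors and keeping the residuals from that regression. The sketch below assumes a hypothetical response function $a\,e^{-bx}$, simulated data and a simple Gauss-Newton fit; derivatives are evaluated at the estimate, and the helper name `projected_residuals` is introduced here only for illustration.

```python
import numpy as np

def f(x, th):
    """Hypothetical response function a * exp(-b * x)."""
    return th[0] * np.exp(-th[1] * x)

def derivs(x, th):
    """First (n x p) and second (n x p x p) derivative arrays of f."""
    ex = np.exp(-th[1] * x)
    V = np.column_stack([ex, -th[0] * x * ex])
    W = np.zeros((x.size, 2, 2))
    W[:, 0, 1] = W[:, 1, 0] = -x * ex
    W[:, 1, 1] = th[0] * x**2 * ex
    return V, W

def fit(x, y, th0, iters=100):
    """Gauss-Newton least squares fit (an assumed, simple choice of algorithm)."""
    th = np.array(th0, dtype=float)
    for _ in range(iters):
        step = np.linalg.lstsq(derivs(x, th)[0], y - f(x, th), rcond=None)[0]
        th += step
        if np.linalg.norm(step) < 1e-13:
            break
    return th

def projected_residuals(e, V, W, rel_tol=1e-10):
    """Project e onto C'(V, W); return residuals, Studentized values, sigma^2 estimate."""
    n, p = V.shape
    second = [W[:, a, b] for a in range(p) for b in range(a, p)]
    M = np.column_stack([V] + second)
    # Orthonormal basis of C(V, W); the SVD copes with zero or redundant columns
    Um, s, _ = np.linalg.svd(M, full_matrices=False)
    rank = int(np.sum(s > rel_tol * s[0]))
    basis = Um[:, :rank]
    P12_perp = np.eye(n) - basis @ basis.T
    proj = P12_perp @ e
    sigma2_hat = float(e @ proj) / (n - rank)          # eq. (23): tr(P'_12) = n - rank
    student = proj / np.sqrt(sigma2_hat * np.diag(P12_perp))   # eq. (22)
    return proj, student, sigma2_hat, P12_perp

x = np.linspace(0.0, 2.0, 10)
rng = np.random.default_rng(2)
y = f(x, [2.0, 1.0]) + 0.05 * rng.standard_normal(x.size)
theta_hat = fit(x, y, [2.0, 1.0])
e = y - f(x, theta_hat)
V_hat, W_hat = derivs(x, theta_hat)
proj, student, sigma2_hat, P12_perp = projected_residuals(e, V_hat, W_hat)
```

For this particular response function only one second derivative vector lies outside $C(V)$, so the projection removes three dimensions in total rather than five.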

The projected residuals overcome many of the shortcomings of the ordinary residuals
and can be interpreted in much the same way as the residuals from linear regression. For
example, suppose that the response function is off by a term $g(\theta)$ so that the true
response function is $f(\theta) + g(\theta)$. In this case the errors become $\varepsilon + g(\theta)$ and the
projected residuals are

$$P'_{12} e = P'_{12} g(\theta) + P'_{12} \varepsilon .$$

As in linear regression, a plot of $P'_{12} e$ against the explanatory variables associated with
$g(\theta)$ may reveal the presence of the systematic component $P'_{12} g(\theta)$.

A potential disadvantage of the projected residuals is that there is no longer an
exact correspondence between residuals and observations. There is, however, an approximate
correspondence between the projected residuals and the errors in roughly the same way that
there is a correspondence between the ordinary residuals and the errors in linear
regression. Suppose, for example, that the first error contains an outlier of magnitude
$\delta$ so that $g(\theta) = \delta b_1$ where $b_1$ is the first standard basis vector. Then

$$P'_{12} e = \delta P'_{12} b_1 + P'_{12} \varepsilon .$$

As in linear regression, the first component of $P'_{12} e$ will be inflated by an amount that
is usually in excess of the amount that the remaining residuals are inflated.


5.  ILLUSTRATIONS 


For our first illustration, we consider the class of partially nonlinear models with
response functions of the form

$$f(\theta) = X\alpha + \delta g(\gamma) \tag{24}$$

where $X$ is a known full rank $n \times (p - 2)$ matrix, $\theta^T = (\alpha^T, \delta, \gamma)$ and $\delta$ and $\gamma$ are
scalars. This class of response functions occurs often in practice and in the statistical
literature. In particular, (24) allows for transformations of explanatory variables in
linear regression.

For the response function described by (24) it is easily seen that

$$V = (X, g(\gamma), \delta g^{1}(\gamma)) \tag{25}$$

where $g^{1}(\gamma)$ is the $n \times 1$ vector with elements $\partial g_i(\gamma)/\partial\gamma$, $i = 1, 2, \ldots, n$. Further,
there are only two nonzero second derivative vectors, $w_{\delta\gamma} = g^{1}(\gamma)$ and $w_{\gamma\gamma} = \delta g^{2}(\gamma)$,
where $g^{2}(\gamma)$ has elements $\partial^2 g_i(\gamma)/\partial\gamma^2$. Thus,

$$C(V, W) = C(X, g(\gamma), g^{1}(\gamma), g^{2}(\gamma)) \tag{26}$$

and

$$P'_{12} e = P'_1 e - P_{2/1} e \approx P'_1 \varepsilon - P_{2/1} \varepsilon$$

where $P'_1$ is the projection operator for $C'(V)$ and $P_{2/1}$ the projection operator for
$C(N N^T g^{2}(\gamma))$. The linear approximation will work well whenever $g^{2}(\gamma)$ is in or lies close
to $C(V)$; that is, whenever the residuals from the regression of $g^{2}(\gamma)$ on $V$ are
sufficiently small. Otherwise, the adjustment $P_{2/1}\varepsilon$ will be important.

This condition for the adequacy of the linear approximation is also reflected by $Ee$
given by (12) and $\mathrm{Var}(e)$ given by (16). Evaluating (12) we find

$$Ee = -\tfrac{1}{2} \delta\, \mathrm{Var}_\infty(\hat\gamma)\, P'_1 g^{2}(\gamma) \tag{27}$$

where $\mathrm{Var}_\infty(\hat\gamma)$ is the large sample variance of $\hat\gamma$, i.e. the appropriate element of
$(V^TV)^{-1}\sigma^2$. It can also be established that the second and third terms of (16) will be
small if $P'_1 g^{2}(\gamma)$ is small (see equation 17).
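Equation (27) can be checked numerically against the general expression (12). The sketch below assumes a power transformation $g_i(\gamma) = z_i^\gamma$ of a hypothetical explanatory variable $z$, an intercept-only $X$, and made-up values for $\delta$, $\gamma$ and $\sigma^2$; none of these come from the report.

```python
import numpy as np

n = 12
z = np.linspace(1.0, 4.0, n)              # hypothetical explanatory variable
X = np.ones((n, 1))                       # intercept only, so p = 3
delta, gamma, sigma2 = 1.5, 0.5, 0.01     # hypothetical parameter values

g = z**gamma                              # g(gamma)
g1 = g * np.log(z)                        # first derivative with respect to gamma
g2 = g * np.log(z)**2                     # second derivative with respect to gamma

V = np.column_stack([X, g, delta * g1])   # eq. (25)
p = V.shape[1]
W = np.zeros((n, p, p))
W[:, p - 2, p - 1] = W[:, p - 1, p - 2] = g1      # w_{delta gamma}
W[:, p - 1, p - 1] = delta * g2                   # w_{gamma gamma}

Qv = np.linalg.qr(V)[0]                   # orthonormal basis of C(V)
P1_perp = np.eye(n) - Qv @ Qv.T           # projection onto C'(V), equal to N N^T
VtV_inv = np.linalg.inv(V.T @ V)

# General expression, eq. (12)
d = -0.5 * sigma2 * np.einsum("rs,isr->i", VtV_inv, W)
Ee_general = P1_perp @ d

# Special form for partially nonlinear models, eq. (27)
var_gamma = sigma2 * VtV_inv[p - 1, p - 1]        # large sample variance of gamma-hat
Ee_special = -0.5 * delta * var_gamma * (P1_perp @ g2)
```

The two agree because the contribution of $g^{1}(\gamma)$ to $d$ lies in $C(V)$ and is removed by the projection.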

Our second illustration, which is primarily numerical, is based on the model

$$f(x, \theta) = \theta_1 + \theta_2 (x - \theta_4) + \theta_3 \{(x - \theta_4)^2 + \theta_5\}^{1/2} \tag{28}$$

and data set 3 from Ratkowsky (1983, Table 6.18). The data consist of 27 observations on
radioactivity counts $y$ at equally spaced time intervals, $x = 1, 2, \ldots, 27$. In this
example, all derivatives are estimated by substituting the maximum likelihood estimates
given by Ratkowsky (1983, Table 6.19) for unknown parameters.

For this model and data set, the difference between the linear and quadratic
approximations of $\mathrm{Var}(e_i)$ is small. Generally, the quadratic part of (16) accounts for
only about 3% of the total variance. Since the contribution of the quadratic terms is
small, the linear approximation of $\mathrm{Var}(e_i)$ with $\hat\sigma = s = .3095$ was used to construct the
vector $S(e)$ of Studentized ordinary residuals with elements $e_i / \{\hat\sigma (\hat P'_{1})_{ii}^{1/2}\}$. Here and in
the remainder of this discussion, a "hat" above any quantity indicates evaluation at $\hat\theta$.

For reference, a scatter plot of the elements of $S(e)$ versus $x$ is given as Figure
1 and an index plot of $\mathrm{diag}(\hat P_1)$ versus $x$ is given as Figure 2. The corresponding plot
of the ordinary residuals is similar to that displayed in Figure 1. Notice from Figure 2
that the first two or three cases in addition to the cases, particularly the last, that
fall on the plateau of the response function will have relatively large influences on the
fitted model.

A plot of $Ee$ versus $x$ is given as Figure 3. For ease of interpretation, the
elements of $Ee$ have been scaled in the same way as the elements of $S(e)$. The plot of
the unscaled $Ee$ versus $x$ is similar. The residual expectations are clearly patterned,
although their expected sizes indicate that the residual biases are not likely to play a
dominant role in diagnostic plots. If the experimental error were increased, however,
patterns such as that in Figure 3 could become extremely important.

We next turn to the projected residuals, $\hat P'_{12} e$. The difference between the variances
of the ordinary and projected residuals is indicated in Figure 4 which is a plot of
$(\mathrm{diag}(\hat P'_1) - \mathrm{diag}(\hat P'_{12}))$ versus $x$. Figure 5 gives a plot of $(S(e) - S(\hat P'_{12} e))$ versus $x$,
where $S(\hat P'_{12} e)$ is the vector of Studentized projected residuals constructed by using (22)
and (23) ($\hat\sigma = .3141$). Again, a pattern is clearly evident and the largest differences
occur on the plateau of the response function. The magnitudes of the differences are, of
course, large enough to produce noticeable changes in various diagnostics. For example,
the largest absolute Studentized ordinary residual is $S(e)_{22} = -2.2$ and the largest
absolute Studentized projected residual is $S(\hat P'_{12} e)_{22} = -2.5$, which reflects an increase
in the chance that case 22 may be an outlier.

The general patterns in Figures 3 and 5 are quite similar. We can explain this
occurrence with a heuristic argument that also serves to point out some of the detail
behind this example. First, the plot of $e - P'_{12} e$ versus $x$ is very similar to those in
Figures 3 and 5 so that we can consider unscaled rather than Studentized residuals.
Second, from (11) and (21)

$$e - P'_{12} e = P_{12} e \approx P_{2/1}\varepsilon - U B_\eta \tau - \tfrac{1}{2} \tau^T W^* \tau \tag{29}$$

so that $P_1 P_{12} e = -U B_\eta \tau$ which can be estimated by substituting estimates for unknown
parameters. In the example at hand, the estimated elements of $U B_\eta \tau$ are small relative to
those of $P_{12} e$. Next, of the 15 second derivative vectors, 12 are in $C(V)$. Of the
remaining 3 vectors only one contributes substantially to the determination of $P_{12} e$.
Thus, $P_{12} e$ is roughly a random scalar times a single column of $W^*$. Since
$Ee$ also lies in $C(W^*)$ by (12), we can expect Figures 3 and 5 to look similar.
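The count of second derivative vectors lying in $C(V)$ follows from the structure of model (28) and can be reproduced without the data. The sketch below uses $x = 1, \ldots, 27$ as in the example, but the parameter values are hypothetical placeholders, not the estimates from Ratkowsky (1983); the analytic derivatives are written out by hand for this sketch.

```python
import numpy as np
from itertools import combinations_with_replacement

x = np.arange(1.0, 28.0)                            # x = 1, ..., 27 as in the example
t1, t2, t3, t4, t5 = 10.0, -0.5, -0.5, 12.0, 1.0    # hypothetical values, NOT Ratkowsky's estimates
u = x - t4
h = np.sqrt(u**2 + t5)

# First derivatives of f = t1 + t2*u + t3*h with respect to t1, ..., t5
V = np.column_stack([np.ones_like(x), u, h, -t2 - t3 * u / h, t3 / (2 * h)])

# The 15 distinct second derivative vectors (0-based parameter indices); most are zero
zero = np.zeros_like(x)
second = {pair: zero for pair in combinations_with_replacement(range(5), 2)}
second[(1, 3)] = -np.ones_like(x)          # d2f / dt2 dt4
second[(2, 3)] = -u / h                    # d2f / dt3 dt4
second[(2, 4)] = 1 / (2 * h)               # d2f / dt3 dt5
second[(3, 3)] = t3 * t5 / h**3            # d2f / dt4^2
second[(3, 4)] = t3 * u / (2 * h**3)       # d2f / dt4 dt5
second[(4, 4)] = -t3 / (4 * h**3)          # d2f / dt5^2

U = np.linalg.qr(V)[0]                     # orthonormal basis for C(V)

def in_column_space(w, tol=1e-8):
    """True if w lies in C(V) up to rounding error."""
    resid = w - U @ (U.T @ w)
    return np.linalg.norm(resid) <= tol * max(1.0, np.linalg.norm(w))

inside = [pair for pair, w in second.items() if in_column_space(w)]
outside = [pair for pair in second if pair not in inside]
```

Under these assumptions the three vectors left outside $C(V)$ are those involving only $\theta_4$ and $\theta_5$, and two of them are proportional to each other.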




6.  DISCUSSION 


There is clearly a close relationship between the results of this paper and methods
for assessing intrinsic curvature (Bates and Watts 1980). As expected, we have found that
the difference between the ordinary and projected residuals is negligible when the maximum
intrinsic curvature is sufficiently small. Such behavior might be taken as justification
for using the maximum intrinsic curvature as a diagnostic to indicate when the difference
between the ordinary and projected residuals is likely to be substantial. However, the
intrinsic curvature and the projected residuals both necessitate the often tedious
construction of the second derivative vectors. Once these vectors are available, the
construction of the projected residuals is straightforward and can be carried out in most
standard regression programs. It does not seem sensible to rely on a diagnostic that is no
easier to construct than the quantity of primary interest.

In our experience, the projected residuals rarely alter the patterns of ordinary
residual plots in a way that completely changes our interpretation, although we see no
inherent reason why such drastic changes cannot occur with some frequency in particular
applications. On the other hand, summary statistics computed from residuals often change
in important ways.

Numerical problems may be encountered during the construction of the projected
residuals when the second derivative vectors lie close to $C(V)$ so that the model is
essentially linear. For example, when using standard regression programs to compute the
projected residuals, all of the second derivative vectors will occasionally be deleted
automatically because of high correlations with the columns of $V$.

Finally, the results described in this paper rely on the accuracy of the various
quadratic approximations, of course. In principle, all results can be extended by carrying
the approximations to a higher order. The practical advantages of such extensions,
however, are unclear.


APPENDIX


Derivation  of  iquatlon  (7) 

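Let $L = L(\theta)$ denote the log likelihood for model (1) and, without loss of generality,
assume that $\sigma^2$ is known. Further, let $L_r = \partial L/\partial\theta_r$, $L_{rs} = \partial L_r/\partial\theta_s$ and let
$L_{rst} = \partial L_{rs}/\partial\theta_t$. The quadratic expansion of the likelihood equations $L_r(\hat\theta) = 0$,
$r = 1,2,\ldots,p$, about the true value $\theta^*$ is

\[
L_r + \sum_{s} \phi_s L_{rs} + \frac{1}{2} \sum_{s,t} \phi_s \phi_t L_{rst} \cong 0 \tag{A.1}
\]

where $\phi_k$ is the k-th component of $\phi = (\hat\theta - \theta^*)$.

The first term of (A.1) is simply $L_r = \sum_{i=1}^{n} \varepsilon_i f_i^{r}/\sigma^2$, $r = 1,2,\ldots,p$, or in matrix
notation

\[
(L_r) = V^T \varepsilon/\sigma^2 . \tag{A.2}
\]

For the second term, $L_{rs} = \sum_i (\varepsilon_i f_i^{rs} - f_i^{r} f_i^{s})/\sigma^2$ so that

\[
\Bigl( \sum_{s} \phi_s L_{rs} \Bigr) = \Bigl\{ \sum_i \varepsilon_i W_i \phi - V^T V \phi \Bigr\} / \sigma^2 . \tag{A.3}
\]

Next,

\[
L_{rst} = \sum_i \bigl( \varepsilon_i f_i^{rst} - f_i^{rs} f_i^{t} - f_i^{rt} f_i^{s} - f_i^{st} f_i^{r} \bigr) / \sigma^2 .
\]

Since the approximation is to be constructed by ignoring terms involving cubic and higher
powers in the $\varepsilon_i$'s, the first term of $L_{rst}$ is set to zero. This gives for
$r = 1,2,\ldots,p$

\[
\frac{1}{2} \sum_{s,t} \phi_s \phi_t L_{rst}
= -\frac{1}{2\sigma^2} \sum_i \sum_{s,t} \phi_s \phi_t \bigl( f_i^{rs} f_i^{t} + f_i^{rt} f_i^{s} + f_i^{st} f_i^{r} \bigr)
= -\frac{1}{\sigma^2} \sum_i \sum_{s,t} \phi_s \phi_t \bigl( f_i^{rs} f_i^{t} + \tfrac{1}{2} f_i^{st} f_i^{r} \bigr)
\]

or in matrix notation,

\[
\frac{1}{2} \Bigl( \sum_{s,t} \phi_s \phi_t L_{rst} \Bigr)
= -\frac{1}{\sigma^2} \Bigl\{ \sum_i (v_i^T \phi) W_i \phi + \frac{1}{2} V^T (\phi^T W \phi) \Bigr\} \tag{A.4}
\]

where $v_i^T$ is the i-th row of V.

Finally, substituting (A.2), (A.3) and (A.4) into (A.1) and rearranging terms we find
that

\[
(V^T V)\phi \cong V^T \varepsilon + \sum_i (\varepsilon_i - v_i^T \phi) W_i \phi - \frac{1}{2} V^T (\phi^T W \phi)
\]

or

\[
V\phi \cong H\varepsilon + V (V^T V)^{-1} \sum_i (\varepsilon_i - v_i^T \phi) W_i \phi - \frac{1}{2} H (\phi^T W \phi) . \tag{A.5}
\]

Substituting the standard linear approximation for the $\phi$'s on the right side of (A.5)
yields equation (7).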
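The expansion can be checked numerically in the simplest case. The sketch below is an illustration under stated assumptions: it uses a hypothetical one-parameter model $f(x,\theta) = \exp(-\theta x)$ with arbitrarily chosen design points and errors, none of which come from the report. With p = 1 the relation preceding (A.5) reduces to the scalar form $(v^T v)\phi \cong v^T\varepsilon + \sum_i (\varepsilon_i - v_i\phi) w_i \phi - \tfrac{1}{2}(v^T w)\phi^2$, where $v_i$ and $w_i$ are the first and second derivatives of $f_i$ at $\theta^*$. The code compares the linear approximation, the quadratic approximation obtained by substituting the linear one on the right side, and the exact least squares solution found by Gauss-Newton iteration.

```python
import numpy as np

def f(theta, x):
    # Hypothetical one-parameter model, chosen only for illustration.
    return np.exp(-theta * x)

theta_star = 1.0
x = np.array([0.5, 1.0, 1.5, 2.0, 2.5])
eps = np.array([0.02, -0.01, 0.03, -0.02, 0.01])   # fixed small errors
y = f(theta_star, x) + eps

# First and second derivatives of f with respect to theta at theta*.
v = -x * np.exp(-theta_star * x)
w = x**2 * np.exp(-theta_star * x)

# Exact least squares estimate via Gauss-Newton, started at theta*.
theta = theta_star
for _ in range(100):
    r = y - f(theta, x)
    g = -x * np.exp(-theta * x)
    theta += (g @ r) / (g @ g)
phi_hat = theta - theta_star

# Standard linear approximation: phi = (v'v)^{-1} v'eps.
vtv = v @ v
phi_lin = (v @ eps) / vtv

# Quadratic approximation: linear approximation substituted on the right side.
rhs = (v @ eps
       + np.sum((eps - v * phi_lin) * w) * phi_lin
       - 0.5 * (v @ w) * phi_lin**2)
phi_quad = rhs / vtv

print("exact    :", phi_hat)
print("linear   :", phi_lin)
print("quadratic:", phi_quad)
```

For errors of this size the quadratic approximation should lie noticeably closer to the exact value than the linear one, since the terms it neglects are of third order in the errors.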




REFERENCES 


[1] Bates, D. M. and Watts, D. G. (1980). "Relative curvature measures of
nonlinearity". Journal of the Royal Statistical Society B, 42, 1-25.

[2] Box, M. J. (1971). "Bias in nonlinear estimation". Journal of the Royal Statistical
Society B, 33, 171-201.

[3] Clarke, G. P. Y. (1980). "Moments of the least squares estimators in a nonlinear
regression model". Journal of the Royal Statistical Society B, 42, 227-237.

[4]  Cook,  R.  D.  and  Weisberg,  S.  (1982).  Residuals  and  Influence  in  Regression.  Chapman 
and  Hall:  London. 

[5]  Cox,  D.  R.  and  Snell,  E.  J.  (1968).  "A  general  definition  of  residuals".  Journal  of 
the  Royal  Statistical  Society  B,  30,  248-275. 

[6] Hamilton, D. C., Watts, D. G. and Bates, D. M. (1982). "Accounting for intrinsic
nonlinearity in nonlinear regression parameter inference regions". Annals of
Statistics, 10, 386-393.

[7] Kennedy, W. and Gentle, J. (1980). Statistical Computing. Marcel Dekker, Inc.:
New York.

[8] Ratkowsky, D. A. (1983). Nonlinear Regression Modeling. Marcel Dekker, Inc.:
New York.

[9] Wu, C. F. (1981). "Asymptotic theory of nonlinear least squares estimation". Annals
of Statistics, 9, 501-513.



